10 research outputs found

    Analysis of approximate nearest neighbor searching with clustered point sets

    Full text link
    We present an empirical analysis of data structures for approximate nearest neighbor searching. We compare the well-known optimized kd-tree splitting method against two alternative splitting methods. The first, called the sliding-midpoint method, attempts to balance the goals of producing subdivision cells of bounded aspect ratio while not producing any empty cells. The second, called the minimum-ambiguity method, is a query-based approach. In addition to the data points, it is also given a training set of query points for preprocessing. It employs a simple greedy algorithm to select the splitting plane that minimizes the average amount of ambiguity in the choice of the nearest neighbor for the training points. We provide an empirical analysis comparing these two methods against the optimized kd-tree construction for a number of synthetically generated data and query sets. We demonstrate that for clustered data and query sets, these algorithms can provide significant improvements over the standard kd-tree construction for approximate nearest neighbor searching. Comment: 20 pages, 8 figures. Presented at ALENEX '99, Baltimore, MD, Jan 15-16, 1999
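
    The sliding-midpoint rule is concrete enough to sketch. Below is a minimal Python rendering of the rule as described here (cut the cell's longest side at its midpoint; if one child would be empty, slide the plane to the nearest data point); all names are illustrative, and ties between duplicate coordinates are not handled.

    ```python
    import numpy as np

    def sliding_midpoint_split(points, lo, hi):
        """One sliding-midpoint split of the rectangular cell [lo, hi]
        containing `points` (an (n, d) array).

        The cell's longest side is cut at its midpoint. If every point
        falls on one side, the plane slides to the extreme coordinate so
        that the nearest point ends up alone in the far child; no child
        is ever empty."""
        dim = int(np.argmax(hi - lo))        # longest side of the cell
        cut = 0.5 * (lo[dim] + hi[dim])      # midpoint cut
        coords = points[:, dim]
        left = coords <= cut
        if left.all():                       # right child would be empty:
            cut = coords.max()               # slide right; the extreme
            left = coords < cut              # point becomes the right child
        elif not left.any():                 # left child would be empty:
            cut = coords.min()               # slide left
            left = coords <= cut
        return dim, cut, left

    # The slide fires exactly when data are clustered: points packed near
    # the origin of the unit square make the midpoint cut at 0.5 leave the
    # right child empty, so the plane slides onto the extreme point.
    pts = 0.1 * np.random.default_rng(0).random((8, 2))
    dim, cut, left = sliding_midpoint_split(pts, np.zeros(2), np.ones(2))
    print(dim, cut, int(left.sum()), int((~left).sum()))
    ```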

    On the Efficiency of Nearest Neighbor Searching with Data Clustered in Lower Dimensions

    No full text
    In nearest neighbor searching we are given a set of n data points in real d-dimensional space, R^d, and the problem is to preprocess these points into a data structure, so that given a query point, the nearest data point to the query point can be reported efficiently. Because data sets can be quite large, we are interested in data structures that use optimal O(dn) storage. Given the limitation of linear storage, the best known data structures suffer from expected-case query times that grow exponentially in d. However, it is widely observed in practice that data sets in high-dimensional spaces tend to consist of clusters residing in much lower-dimensional subspaces. This raises the question of whether data structures for nearest neighbor searching adapt to the presence of lower-dimensional clustering, and further how performance varies when the clusters are aligned with the coordinate axes. We analyze the popular kd-tree data structure in the form of two variants based on a modification of the splitting method, which produces cells satisfying the basic packing properties needed for efficiency without producing empty cells. We show that when data points are uniformly distributed on a k-dimensional hyperplane for k ≤ d, the expected number of leaves visited in such a kd-tree grows exponentially in k, but not in d. We show that the growth rate is smaller still if the hyperplane is aligned with the coordinate axes. We present empirical studies to support our theoretical results. Keywords: Nearest neighbor searching, kd-trees, splitting methods, expected-case analysis, clustering.
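
    The scenario analyzed here is easy to reproduce informally. The sketch below generates points uniform on a k-dimensional axis-aligned flat inside R^d and times exact nearest neighbor queries with scipy's cKDTree, whose documentation credits the sliding-midpoint construction from this line of work; it is an illustrative experiment under those assumptions, not the paper's benchmark.

    ```python
    import time
    import numpy as np
    from scipy.spatial import cKDTree

    def flat_data(n, d, k, rng):
        """n points uniform on a k-dimensional axis-aligned flat in R^d:
        only the first k coordinates vary, the rest stay at zero."""
        pts = np.zeros((n, d))
        pts[:, :k] = rng.random((n, k))
        return pts

    rng = np.random.default_rng(0)
    d = 32
    for k in (2, 8, 32):                    # intrinsic dimension of the data
        data = flat_data(20000, d, k, rng)
        queries = flat_data(500, d, k, rng)
        tree = cKDTree(data)
        t0 = time.perf_counter()
        tree.query(queries, k=1)            # exact 1-NN for each query
        print(f"k={k:2d}: {time.perf_counter() - t0:.3f}s")
    ```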

    Analysis of Approximate Nearest Neighbor Searching with Clustered Point Sets (DIMACS Series in Discrete Mathematics and Theoretical Computer Science)

    No full text
    Abstract. Nearest neighbor searching is a fundamental computational problem. A set of n data points is given in real d-dimensional space, and the problem is to preprocess these points into a data structure, so that given a query point, the nearest data point to the query point can be reported efficiently. Because data sets can be quite large, we are primarily interested in data structures that use only O(dn) storage. A popular class of data structures for nearest neighbor searching is the kd-tree and variants based on hierarchically decomposing space into rectangular cells. An important question in the construction of such data structures is the choice of a splitting method, which determines the dimension and splitting plane to be used at each stage of the decomposition. This choice of splitting method can have a significant influence on the efficiency of the data structure. This is especially true when data and query points are clustered in low-dimensional subspaces, because clustering can lead to subdivisions in which cells have very high aspect ratios. We compare the well-known optimized kd-tree splitting method against two alternative splitting methods. The first, called the sliding-midpoint method, attempts to balance the goals of producing subdivision cells of bounded aspect ratio while not producing any empty cells. The second, called the minimum-ambiguity method, is a query-based approach. In addition to the data points, it is also given a training set of query points for preprocessing. It employs a simple greedy algorithm to select the splitting plane that minimizes the average amount of ambiguity in the choice of the nearest neighbor for the training points. We provide an empirical analysis comparing these two methods against the optimized kd-tree construction for a number of synthetically generated data and query sets. We demonstrate that for clustered data and query sets, these algorithms can provide significant improvements over the standard kd-tree construction for approximate nearest neighbor searching.
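
    The minimum-ambiguity rule also admits a small sketch. Below is one simplified reading of it in Python: a training query's ambiguity for a candidate cut is taken to be the number of data points that the standard kd-tree pruning test cannot rule out once the query's side of the cut is known, and the greedy step picks the cut with the lowest average. The paper's actual ambiguity measure differs in detail; every name here is illustrative.

    ```python
    import numpy as np

    def avg_ambiguity(dim, cut, data, queries):
        """Average over training queries of how many data points remain
        candidate nearest neighbors after one cut. The far side of the
        plane can be discarded only if the query's nearest-neighbor ball
        does not cross the plane (the usual kd-tree pruning test)."""
        total = 0
        for q in queries:
            near = np.sqrt(((data - q) ** 2).sum(axis=1).min())  # NN distance
            if abs(q[dim] - cut) <= near:        # ball crosses the plane:
                total += len(data)               # nothing is pruned
            else:                                # far side pruned entirely
                same_side = (data[:, dim] <= cut) == (q[dim] <= cut)
                total += int(same_side.sum())
        return total / len(queries)

    def greedy_min_ambiguity_cut(data, queries):
        """Pick the (dimension, cut) with minimum average ambiguity, trying
        cuts halfway between consecutive data coordinates in each dimension."""
        best = (float("inf"), 0, 0.0)
        for dim in range(data.shape[1]):
            xs = np.unique(data[:, dim])
            for cut in (xs[:-1] + xs[1:]) / 2.0:
                best = min(best, (avg_ambiguity(dim, cut, data, queries), dim, cut))
        return best  # (average ambiguity, dimension, cut value)
    ```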

    It’s okay to be skinny, if your friends are fat

    No full text
    The kd-tree is a popular and simple data structure for range searching and nearest neighbor searching. Such a tree subdivides space into rectangular cells through the recursive application of some splitting rule. The choice of splitting rule affects the shape of cells and the structure of the resulting tree. It has been shown that an important element in achieving efficient query times for approximate queries is that each cell should be fat, meaning that the ratio of its longest side to its shortest side (its aspect ratio) should be bounded. Subdivisions with fat cells satisfy a property called the packing constraint, which bounds the number of disjoint cells of a given size that can overlap a ball of a given radius. We consider a splitting rule called the sliding-midpoint rule. It has been shown to provide efficient search times for approximate nearest neighbor and range searching, both in practice and in terms of expected-case query time. However, it has not been possible to prove results about this tree because it can produce cells of unbounded aspect ratio. We show that in spite of this, the sliding-midpoint rule generates subdivisions that satisfy the packing constraint, thus explaining their good performance.
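
    The skinny-cells caveat is easy to see numerically. The sketch below runs a bare-bones sliding-midpoint build over two tight clusters and records leaf-cell aspect ratios; it only illustrates that individual cells can get very skinny, not the packing constraint itself, which is a statement about all the cells overlapping a ball. Names and parameters are illustrative.

    ```python
    import numpy as np

    def build(points, lo, hi, ratios, leaf_size=4):
        """Bare-bones sliding-midpoint build over the cell [lo, hi],
        recording the aspect ratio (longest side / shortest side) of
        every leaf cell in `ratios`."""
        side = hi - lo
        if len(points) <= leaf_size:
            ratios.append(side.max() / side.min())
            return
        dim = int(np.argmax(side))             # cut the cell's longest side
        cut = 0.5 * (lo[dim] + hi[dim])
        coords = points[:, dim]
        left = coords <= cut
        if left.all():                         # empty right child: slide
            cut = coords.max()
            left = coords < cut
        elif not left.any():                   # empty left child: slide
            cut = coords.min()
            left = coords <= cut
        if left.all() or not left.any():       # duplicate coordinates: stop
            ratios.append(side.max() / side.min())
            return
        lhi, rlo = hi.copy(), lo.copy()
        lhi[dim], rlo[dim] = cut, cut
        build(points[left], lo, lhi, ratios, leaf_size)
        build(points[~left], rlo, hi, ratios, leaf_size)

    rng = np.random.default_rng(1)
    # Two tight clusters in the unit square: a classic way to force skinny cells.
    pts = np.vstack([rng.normal(0.1, 0.01, (200, 2)),
                     rng.normal(0.9, 0.01, (200, 2))])
    ratios = []
    build(pts, np.zeros(2), np.ones(2), ratios)
    print(f"{len(ratios)} leaves, max aspect ratio {max(ratios):.1f}")
    ```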

    Expressing anger and joy with size code

    No full text
    ABSTRACT This paper reports our finding of the use of a proposed biological code, the size code, in anger and joy speech. In searching for explanations for an F0 peak delay phenomenon related to angry speech that cannot be accounted for by known articulatory constraints, we hypothesized that the delay was due to the lowering of the larynx to exaggerate body size, a biological code known to be used by animals. Our analysis of the formant frequencies in existing emotional speech databases revealed that anger speech had lowered formants and joy speech had raised formants. The results confirm our hypothesis and suggest that the size code is actively used by humans to express emotions.